skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Creators/Authors contains: "Trivedi, Ashutosh"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Free, publicly-accessible full text available November 21, 2026
  2. Free, publicly-accessible full text available September 15, 2026
  3. This paper investigates the performance of a diverse set of large language models (LLMs) including leading closed-source (GPT-4, GPT-4o mini, Claude 3.5 Haiku) and open-source (Llama 3.1 70B, Llama 3.1 8B) models, alongside the earlier GPT-3.5 within the context of U.S. tax resolutions. AI-driven solutions like these have made substantial inroads into legal-critical systems with significant socio-economic implications. However, their accuracy and reliability have not been assessed in some legal domains, such as tax. Using the Volunteer Income Tax Assistance (VITA) certification tests—endorsed by the US Internal Revenue Service (IRS) for tax volunteering—this study compares these LLMs to evaluate their potential utility in assisting both tax volunteers as well as taxpayers, particularly those with low and moderate income. Since the answers to these questions are not publicly available, we first analyze 130 questions with the tax domain experts and develop the ground truths for each question. We then benchmarked these diverse LLMs against the ground truths using both the original VITA questions and syntactically perturbed versions (a total of 390 questions) to assess genuine understanding versus memorization/hallucinations. Our comparative analysis reveals distinct performance differences: closed-source models (GPT-4, Claude 3.5 Haiku, GPT-4o mini) generally demonstrated higher accuracy and robustness compared to GPT-3.5 and the open-source Llama models. For instance, on basic multiple-choice questions, top models like GPT-4 and Claude 3.5 Haiku achieved 83.33% accuracy, surpassing GPT-3.5 (54.17%) and the open-source Llama 3.1 8B (50.00%). These findings generally hold across both original and perturbed questions. However, the paper acknowledges that these developments are initial indicators, and further research is necessary to fully understand the implications of deploying LLMs in this domain. A critical limitation observed across all evaluated models was significant difficulty with open-ended questions, which require accurate numerical calculation and application of tax rules. We hope that this paper provides a means and a standard to evaluate the efficacy of current and future LLMs in the tax domain. 
    more » « less
    Free, publicly-accessible full text available July 8, 2026
  4. Free, publicly-accessible full text available June 12, 2026
  5. Free, publicly-accessible full text available June 1, 2026
  6. Data-driven software is increasingly being used as a critical component of automated decision-support systems. Since this class of software learns its logic from historical data, it can encode or amplify discriminatory practices. Previous research on algorithmic fairness has focused on improving “average-case” fairness. On the other hand, fairness at the extreme ends of the spectrum, which often signifies lasting and impactful shifts in societal attitudes, has received significantly less emphasis. Leveraging the statistics of extreme value theory (EVT), we propose a novel fairness criterion called extreme counterfactual discrimination (ECD). This criterion estimates the worst-case amounts of disadvantage in outcomes for individuals solely based on their memberships in a protected group. Utilizing tools from search-based software engineering and generative AI, we present a randomized algorithm that samples a statistically significant set of points from the tail of ML outcome distributions even if the input dataset lacks a sufficient number of relevant samples. We conducted several experiments on four ML models (deep neural networks, logistic regression, and random forests) over 10 socially relevant tasks from the literature on algorithmic fairness. First, we evaluate the generative AI methods and find that they generate sufficient samples to infer valid EVT distribution in 95% of cases. Remarkably, we found that the prevalent bias mitigators reduce the average-case discrimination but increase the worst-case discrimination significantly in 35% of cases. We also observed that even the tail-aware mitigation algorithm—MiniMax-Fairness—increased the worst-case discrimination in 30% of cases. We propose a novel ECD-based mitigator that improves fairness in the tail in 90% of cases with no degradation of the average-case discrimination. We hope that the EVT framework serves as a robust tool for evaluating fairness in both average-case and worst-case discrimination. 
    more » « less
    Free, publicly-accessible full text available April 26, 2026
  7. Amato, Nancy; Driggs-Campbell, Katie; Ekenna, Chinwe; Morales, Marco; O'Kane, Jason (Ed.)
    We present an approach for systematically anticipating the actions and policies employed by oblivious environments in concurrent stochastic games, while maximizing a reward function. Our main contribution lies in the synthesis of a finite information state machine (ISM) whose alphabet ranges over the actions of the environment. Each state of the ISM is mapped to a belief state about the policy used by the environment. We introduce a notion of consistency that guarantees that the belief states tracked by the ISM stays within a fixed distance of the precise belief state obtained by knowledge of the full history. We provide methods for checking consistency of an automaton and a synthesis approach which, upon successful termination, yields an ISM. We construct a Markov Decision Process (MDP) that serves as the starting point for computing optimal policies for maximizing a reward function defined over plays. We present an experimental evaluation over benchmark examples including human activity data for tasks such as cataract surgery and furniture assembly, wherein our approach successfully anticipates the policies and actions of the environment in order to maximize the reward. 
    more » « less
  8. Efficient verification algorithms for neural networks often depend on various abstract domains such as intervals, zonotopes, and linear star sets. The choice of the abstract domain presents an expressiveness vs. scalability trade-off: simpler domains are less precise but yield faster algorithms. This paper investigates the hexatope and octatope abstract domains in the context of neural net verification. Hexatopes are affine transformations of higher-dimensional hexagons, defined by difference constraint systems, and octatopes are affine transformations of higher-dimensional octagons, defined by unit-two-variable-per-inequality constraint systems. These domains generalize the idea of zonotopes which can be viewed as affine transformations of hypercubes. On the other hand, they can be considered as a restriction of linear star sets, which are affine transformations of arbitrary H-Polytopes. This distinction places hexatopes and octatopes firmly between zonotopes and linear star sets in their expressive power, but what about the efficiency of decision procedures? An important analysis problem for neural networks is the exact range computation problem that asks to compute the exact set of possible outputs given a set of possible inputs. For this, three computational procedures are needed: (1) optimization of a linear cost function; (2) affine mapping; and (3) over-approximating the intersection with a half-space. While zonotopes allow an efficient solution for these approaches, star sets solves these procedures via linear programming. We show that these operations are faster for hexatopes and octatopes than they are for the more expressive linear star sets by reducing the linear optimization problem over these domains to the minimum cost network flow, which can be solved in strongly polynomial time using the Out-of-Kilter algorithm. Evaluating exact range computation on several ACAS Xu neural network benchmarks, we find that hexatopes and octatopes show promise as a practical abstract domain for neural network verification. 
    more » « less
  9. Hillston, Jane; Soudjiani, Sadegh (Ed.)
    We study the problem of inferring the discount factor of an agent optimizing a discounted reward objective in a finite state Markov Decision Process (MDP). Discounted reward objectives are common in sequential optimization, reinforcement learning, and algorithmic game theory. The discount factor is an important parameter used in formulating the discounted reward. It captures the “time value” of the reward- i.e., how much reward at hand would equal a promised reward at a future time. Knowing an agent’s discount factor can provide valuable insights into their decision-making, and help predict their preferences in previously unseen environments. However, pinpointing the exact value of the discount factor used by the agent is a challenging problem. Ad-hoc guesses are often incorrect. This paper focuses on the problem of computing the range of possible discount factors for a rational agent given their policy. A naive solution to this problem can be quite expensive. A classic result by Smallwood shows that the interval [0, 1) of possible discount factor can be partitioned into finitely many sub-intervals, such that the optimal policy remains the same for each such sub-interval. Furthermore, optimal policies for neighboring sub-intervals differ for a single state. We show how Smallwood’s result can be exploited to search for discount factor intervals for which a given policy is optimal by reducing it to polynomial root isolation. We extend the result to situations where the policy is suboptimal, but with a value function that is close to optimal. We develop numerical approaches to solve the discount factor elicitation problem and demonstrate the effectiveness of our algorithms through some case studies. 
    more » « less
  10. Endriss, Ulle; Melo, Francisco (Ed.)
    Alternating-time temporal logic (ATL) extends branching time logic by enabling quantification over paths that result from the strategic choices made by multiple agents in various coalitions within the system. While classical temporal logics express properties of “closed” systems, ATL can express properties of “open” systems resulting from interactions among several agents. Reinforcement learning (RL) is a sampling-based approach to decision-making where learning agents, guided by a scalar reward function, discover optimal policies through repeated interactions with the environment. The challenge of translating high-level objectives into scalar rewards for RL has garnered increased interest, particularly following the success of model-free RL algorithms. This paper presents an approach for deploying model-free RL to verify multi-agent systems against ATL specifications. The key contribution of this paper is a verification procedure for model-free RL of quantitative and non-nested classic ATL properties, based on Q-learning, demonstrated on a natural subclass of non-nested ATL formulas. 
    more » « less